Pthread Parallel K-means

Author

  • Barbara Hohlt
Abstract

K-means is a popular non-hierarchical method for clustering large datasets. Its time requirements increase linearly with the size of the dataset, which makes it particularly suited to extremely large datasets such as those found in digital libraries. The method was developed by MacQueen [4] in 1967. In our project we take a uniprocessor k-means algorithm and implement a parallel k-means algorithm using pthreads. The algorithm we use is from the Normalized Cuts (Ncuts) [6] codebase of the Vision Group at UC Berkeley. The rest of this paper is organized as follows. We start by providing background on Ncuts and k-means. In Section 3.2 we describe our software testbed and present the design and implementation of the parallel k-means algorithm. In Section 3.3 we show our performance measurements and results. We conclude and discuss our plans for future work in Section 4.
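
The abstract does not say how the work is divided among threads. Purely as an illustration, the C sketch below shows one common way to parallelize a single k-means iteration with pthreads: each thread assigns a contiguous slice of points to the nearest centroid and accumulates private partial sums, and the calling thread combines them after joining. The names `kmeans_task`, `assign_slice`, and `kmeans_step`, the fixed limits, and the slicing scheme are assumptions made for this example and are not taken from the paper or the Ncuts codebase.

```c
#include <float.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical fixed limits that keep the sketch short. */
#define MAX_K 16
#define MAX_DIM 8
#define MAX_THREADS 16

/* Work description plus private accumulators for one thread. */
typedef struct {
    const double *points;    /* n * dim values, row-major */
    const double *centroids; /* k * dim values, row-major */
    int *labels;             /* cluster index per point */
    int begin, end;          /* half-open slice [begin, end) of points */
    int k, dim;
    double sums[MAX_K * MAX_DIM]; /* partial coordinate sums per cluster */
    int counts[MAX_K];            /* partial member counts per cluster */
} kmeans_task;

/* Thread body: label each point in the slice with its nearest centroid. */
static void *assign_slice(void *arg) {
    kmeans_task *t = arg;
    for (int i = t->begin; i < t->end; i++) {
        const double *p = t->points + (size_t)i * t->dim;
        int best = 0;
        double best_d = DBL_MAX;
        for (int c = 0; c < t->k; c++) {
            double d = 0.0; /* squared Euclidean distance */
            for (int j = 0; j < t->dim; j++) {
                double diff = p[j] - t->centroids[c * t->dim + j];
                d += diff * diff;
            }
            if (d < best_d) { best_d = d; best = c; }
        }
        t->labels[i] = best;
        t->counts[best]++;
        for (int j = 0; j < t->dim; j++)
            t->sums[best * t->dim + j] += p[j];
    }
    return NULL;
}

/* One iteration: parallel assignment, then a serial reduction that
 * overwrites centroids. Returns 0 on success, -1 on error. */
int kmeans_step(const double *points, int n, int dim,
                double *centroids, int k, int *labels, int nthreads) {
    if (n < 0 || k < 1 || k > MAX_K || dim < 1 || dim > MAX_DIM ||
        nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    pthread_t tids[MAX_THREADS];
    /* calloc zeroes the private sums and counts. */
    kmeans_task *tasks = calloc((size_t)nthreads, sizeof *tasks);
    if (tasks == NULL) return -1;

    int started = 0, rc = 0;
    for (int t = 0; t < nthreads; t++) {
        tasks[t].points = points;
        tasks[t].centroids = centroids;
        tasks[t].labels = labels;
        tasks[t].k = k;
        tasks[t].dim = dim;
        tasks[t].begin = (int)((long long)n * t / nthreads);
        tasks[t].end = (int)((long long)n * (t + 1) / nthreads);
        if (pthread_create(&tids[t], NULL, assign_slice, &tasks[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);

    if (rc == 0) {
        /* All threads have finished, so updating centroids is race-free. */
        for (int c = 0; c < k; c++) {
            int count = 0;
            for (int t = 0; t < nthreads; t++) count += tasks[t].counts[c];
            if (count == 0) continue; /* empty cluster keeps its centroid */
            for (int j = 0; j < dim; j++) {
                double s = 0.0;
                for (int t = 0; t < nthreads; t++)
                    s += tasks[t].sums[c * dim + j];
                centroids[c * dim + j] = s / count;
            }
        }
    }
    free(tasks);
    return rc;
}
```

A caller would invoke `kmeans_step` repeatedly until the labels stop changing, and link with `-pthread`. Keeping the accumulators private to each thread avoids any mutex in the inner loop, at the cost of a small serial reduction per iteration.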

Similar resources

Comparing k-means clusters on parallel Persian-English corpus

This paper compares clusters of aligned Persian and English texts obtained with the k-means method. Text clustering has many applications in various fields of natural language processing. So far, much research on clustering English documents has been carried out. This raises the question of whether those results extend to other languages. Since the goal of document clustering is grouping of docum...

Comparative analysis of multi-threading on different operating systems applied on digital image processing

This work presents a comparative analysis of parallel image convolution implementations based on the shared-variable programming model. Those implementations use explicit compiler directives from multi-thread support libraries. The comparison between implementations was done in Windows and Linux operating systems. It considered both performance and programmability. The performance was analyzed ...

Parallel Homologous Search with Hirschberg Algorithm: A Hybrid MPI-Pthreads Solution

In this paper, we apply two different parallel programming models, the message passing model using the Message Passing Interface (MPI) and the multithreaded model using Pthreads, to protein sequence homologous search. The protein sequence homologous search uses the Hirschberg algorithm for pairwise sequence alignment. The performance of the homologous search using the MPI-Pthread is compared to the...

Using Pthreads in Fortran

This article describes how to use the pthreads library in Fortran programs. With most modern processors offering more and more built-in parallelism, there is ample need to exploit this power at the application level, especially in scientific applications that involve heavy number crunching and multi-GB disk handling. To utilize the power of multi threading for scien...

The Parallel Maximal Cliques Algorithm for Protein Sequence Clustering

Problem statement: Protein sequence clustering is a method used to discover relations between proteins. This method groups the proteins based on their common features. It is a core process in protein sequence classification. Graph theory has been used in protein sequence clustering as a means of partitioning the data into groups, where each group constitutes a cluster. Mohseni-Zadeh introduced ...

Publication date: 2001